COVID-19 Outbreak Spread Analytics - Peru
Before we begin, please note that anything in bold is either a finding or an important comment.
Introduction
This blog illustrates the use of R to analyze the COVID-19 outbreak in Peru, applying different algorithms in an attempt to gauge the spread of the virus and the success of public health containment efforts in reducing the speed and degree of spread.
For this analysis, I’m leveraging Tim Churches’ blog, which contains a detailed analysis of COVID-19 data for China.
This analysis is for informational purposes only and should not be relied upon to inform governmental or health care policy.
Data sources
Data utilized for this analysis comes from the Coronavirus wiki, the Peruvian Ministry of Health (MINSA), the Outbreak Dashboard, and the nCov2019 package. Acquiring the data from these pages through web scraping can be challenging, however, because the format of the tables in which the data appears changes many times per day.
Data limitations
The biggest problem with these analyses is that they are not based on data tabulated by date of onset (of symptoms), but rather on data tabulated by date of reporting, or perhaps date of confirmation by lab test. We understand that results from molecular tests can take up to 7 days to be published, while results from serological tests are not reliable for this purpose because those tests look for antibodies in the bloodstream, which usually develop from 7 to 30 days after contracting the virus.
If lab testing and reporting is swift and the lag is uniform, then the date of reporting or confirmation will lag consistently and only slightly behind the date of onset. Sadly, that’s not usually the case: there are typically variable delays in lab confirmation and/or reporting, leading to backlogs of cases being reported all at once (known as a “data dump” by outbreak epidemiologists). This is very unfortunate, and it frustrates efforts to properly assess outbreak trajectories and the effectiveness of intervention strategies. One solution would be for authorities to provide line-listings of all cases, including their date of reporting and their date of onset of symptoms, even if only presumptive dates are provided (and missing dates of onset can be imputed, in any case).
The effect of the unavoidable (because that’s all that is available) use of data tabulated by date of reporting here (and just about everywhere else) means that reproduction numbers and related measures of infectiousness may be inflated, particularly at first. Nonetheless, the methods illustrated here show the potential utility of epidemic trajectory analysis – but that utility would be so much greater if health authorities published data tabulated by date of onset, or better, line-listings that include date of onset. It would be very easy for them to do that.
Data Update
This initial analysis is meant to visualize and forecast COVID-19 spread in Peru.
Peru COVID-19 dataset was updated on: 17Apr20
Analysis methods
The analysis presented here primarily uses R packages developed and published by the R Epidemics Consortium (RECON).
The functions provided by the RECON incidence package are used to fit log-linear models to the growth and decay phases of the outbreak in Peru, and the decay phase model is used to very approximately predict when local transmission will be extinguished there.
I also utilize the earlyR and EpiEstim packages, which are also published by RECON. In particular, the get_R() function in earlyR calculates a maximum-likelihood estimate (from a distribution of likelihoods for the reproduction number) as well as calculating \(\lambda\), which is a relative measure of the current “force of infection”:
\[ \lambda = \sum_{s=1}^{t-1} {y_{s} w (t - s)} \]
where \(w()\) is the probability mass function (PMF) of the serial interval, and \(y_s\) is the incidence at time \(s\).
The resulting \(\lambda\) “force of infection” plot indicates the daily effective infectiousness (subject to public health controls), with a projection of the diminution of the force of infection if no further cases are observed. The last date of observed data is indicated by the vertical blue dashed line. New cases are shown in a cumulative manner as black dots. It is a sign that the outbreak is being brought under control if \(\lambda\), as indicated by the orange bars, is falling prior to or at the date of last observation (as indicated by the vertical blue line). Note that to the right of the vertical blue line the \(\lambda\) values are projections, valid only if no further cases are observed. As such, the plot is a bit confusing, but it is nonetheless useful if interpreted with this explanation in mind. The RECON packages are all open-source, and easier-to-interpret plots of \(\lambda\) could readily be constructed.
The critical parameter for these calculations is the distribution of serial intervals (SI), which is the time between the date of onset of symptoms for a case and the dates of onset for any secondary cases that case gives rise to. Typically a discrete \(\gamma\) distribution for these serial intervals is assumed, parameterised by a mean and standard deviation, although more complex distributions are probably more realistic.
For the estimation of the force of infection \(\lambda\), for the serial interval we’ll use a discrete \(\gamma\) distribution with a mean of 5.0 days and a standard deviation of 3.4.
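To make this calculation concrete, here is a minimal base-R sketch of the force-of-infection formula above, using the discrete gamma serial interval with mean 5.0 days and SD 3.4; the incidence vector is illustrative, not the Peru data:

```r
# Discretise a gamma serial-interval distribution (mean 5.0, SD 3.4 days)
# and compute the force of infection for the next day:
# lambda_t = sum over s of y_s * w(t - s)
mean_si <- 5.0
sd_si <- 3.4
shape <- (mean_si / sd_si)^2   # gamma shape from mean and SD
scale <- sd_si^2 / mean_si     # gamma scale from mean and SD

# Discrete PMF of the serial interval on days 1, 2, ..., 30
w <- diff(pgamma(0:30, shape = shape, scale = scale))
w <- w / sum(w)                # renormalise after truncation

y <- c(1, 2, 4, 7, 12, 20)     # toy daily incidence counts
t <- length(y) + 1             # evaluate lambda for the next day
lambda_t <- sum(y * w[t - seq_along(y)])
```

In practice earlyR does this for us, but the hand calculation shows that \(\lambda\) is just the incidence series weighted by the serial-interval PMF.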
Data Understanding
After establishing an understanding of the business problem, the initial datasets are loaded for data preparation and exploration prior to model development.
Libraries & Functions
Let’s begin by loading the libraries and functions that will be used for this analysis:
## plyr dplyr data.table
## TRUE TRUE TRUE
## readxl tibble deSolve
## TRUE TRUE TRUE
## stringr stringi forecast
## TRUE TRUE TRUE
## tidyverse broom gt
## TRUE TRUE TRUE
## caret zoo plotly
## TRUE TRUE TRUE
## DT rpart tidyr
## TRUE TRUE TRUE
## rvest hrbrthemes inspectdf
## TRUE TRUE TRUE
## DataExplorer ggplot2 tsoutliers
## TRUE TRUE TRUE
## ggthemes dlm PerformanceAnalytics
## TRUE TRUE TRUE
## projections earlyR ggdendro
## TRUE TRUE TRUE
## rgdal tsibble EpiEstim
## TRUE TRUE TRUE
## epitrix distcrete shadowtext
## TRUE TRUE TRUE
## nCov2019 magrittr incidence
## TRUE TRUE TRUE
Data Collection
Raw data for each Peru COVID-19 dataset are loaded.
Summary
The COVID-19 dataset (COVID_19_Peru_Raw) has 43 rows and 12 columns, while the COVID-19 case-increase-every-2-days dataset has 22 rows and 3 columns.
Worldwide Analytics
Please see below for COVID-19 Outbreak Worldwide Analytics updated on 16Apr20.
Outbreak trend since 100th cases
Here we observe the worldwide outbreak trend after the 100th reported positive case:
Geographic Cumulative Cases
Below we find a geographic heatmap based on positive confirmed cases to date:
Epidemic Heatmap for top 25 countries over the last 7 days
Heatmap of the epidemic situation around the world over the last 7 days (log scale) for the 25 countries with the highest number of positive cases.
Peru Analytics
Exploratory Data Analysis (EDA)
After the raw data is loaded, I proceed to perform EDA on the datasets.
Data Cleaning
For this step, I perform data cleaning where needed.
## # A tibble: 1 x 9
## rows columns discrete_columns continuous_colu~ all_missing_col~
## <int> <int> <int> <int> <int>
## 1 43 12 1 11 0
## # ... with 4 more variables: total_missing_values <int>,
## # complete_rows <int>, total_observations <int>, memory_usage <dbl>
## # A tibble: 1 x 9
## rows columns discrete_columns continuous_colu~ all_missing_col~
## <int> <int> <int> <int> <int>
## 1 22 3 1 2 0
## # ... with 4 more variables: total_missing_values <int>,
## # complete_rows <int>, total_observations <int>, memory_usage <dbl>
Daily cumulative incidence
First, let’s look at the daily cumulative number of incident, lab-confirmed cases for Lima, for all the other provinces combined, and for all of Peru.
The initial increases for Lima and the balance of the other provinces, and for all of Peru look to be approximately sigmoidal, as is expected for epidemic spread. Let’s plot them on a logarithmic y axis. We would expect to see a linear increase on a log scale if the epidemic curve is indeed exponential.
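As a sketch, such a log-scale plot can be produced with ggplot2, using the data frame and column names that appear in the model-fitting code later in this post:

```r
library(ggplot2)

# Cumulative incidence on a log10 y-axis: exponential growth
# appears as an approximately straight line
ggplot(Peru_Cumulative_Incidence,
       aes(x = Date, y = cumulative_incident_cases, colour = province)) +
  geom_line() +
  geom_point() +
  scale_y_log10() +
  labs(y = "Cumulative confirmed cases (log scale)", x = NULL)
```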
One could convince oneself that log-linearity is present.
Daily incremental incidence
Let’s also look at the daily incremental incidence. This is more informative, and is known in outbreak epidemiological parlance as the epidemic curve. It is traditionally visualised as a bar chart, which emphasises missing data more than a line chart, even one with points as well. We’ll adhere to epidemiological tradition.
At the time of writing (17Apr20), it looks like incidence has yet to plateau in Peru.
Daily cumulative and incremental deaths in lab-confirmed cases
Now let’s look at the daily cumulative number of deaths in lab-confirmed cases for all provinces in Peru combined.
Let’s also look at the daily incremental deaths in lab-confirmed cases.
Daily counts of deaths are clearly continuing to rise. This is not surprising: it takes some time for cases to either recover or to die, and therefore the trend in deaths will necessarily lag behind any trend in daily incidence.
Fitting an SIR model to the Peru data
On 4 February 2020, data science blogger Learning Machines posted this analysis of the COVID-19 outbreak, in which he fitted the classic SIR (Susceptible-Infectious-Recovered) model to the incidence data for all of China. I’ll use his explanation of how to fit this model using R as the basis for this analysis.
The basic idea behind the SIR model of communicable disease outbreaks is that there are three groups (also called compartments) of people: those who are healthy but susceptible to the disease \(S\), the infectious (and thus, infected) \(I\) and people who have recovered \(R\):
Source: wikipedia
To model the dynamics of the outbreak we need three differential equations, to describe the rates of change in each group, parameterised by \(\beta\) which controls the transition between \(S\) and \(I\) and \(\gamma\) which controls the transition between \(I\) and \(R\):
\[\frac{dS}{dt} = - \frac{\beta I S}{N}\]
\[\frac{dI}{dt} = \frac{\beta I S}{N}- \gamma I\]
\[\frac{dR}{dt} = \gamma I\]
The first step is to express these differential equations as an R function, which is easier than one might think – the expressions in the code are just direct translations of the differential equations, with respect to time \(t\) of course.
SIR <- function(time, state, parameters) {
par <- as.list(c(state, parameters))
with(par, {
dS <- -beta * I * S / N
dI <- beta * I * S / N - gamma * I
dR <- gamma * I
list(c(dS, dI, dR))
})
}

To fit the model to the data we need two things: a solver for these differential equations and an optimiser to find the optimal values for our two unknown parameters, \(\beta\) and \(\gamma\). The function ode() (for ordinary differential equations) from the deSolve package for R makes solving the system of equations easy, and to find the optimal values for the parameters we wish to estimate, we can just use the optim function built into base R. Specifically, what we need to do is minimise the sum of the squared differences between \(I(t)\), which is the number of people in the infectious compartment \(I\) at time \(t\), and the corresponding number of cases as predicted by our model \(\hat{I}(t)\). This quantity is known as the residual sum of squares (RSS)1.
\[RSS(\beta, \gamma) = \sum_{t} \left( I(t)-\hat{I}(t) \right)^2\]
I now proceed to fit a model to the incidence data for all of Peru. We need a value \(N\) for the initial uninfected population. Once again we’ll scrape population data from a suitable wikipedia page.
The approximate population of Peru in 2017 was 31,237,385 people, according to this wikipedia page.
Next, we need to create a vector with the daily cumulative incidence for Peru, from 7th March when our daily incidence data starts, through to the current date. We’ll then compare the predicted incidence from the SIR model fitted to these data with the actual incidence since 7th March. We also need to initialise the values for \(S\), \(I\) and \(R\).
# put the daily cumulative incidence numbers for Peru from 7th Mar to 14th
# Apr into a vector called Infected
sir_start_date <- "2020-03-07"
Infected <- Peru_Cumulative_Incidence %>% filter(province == "Peru_cases_increase",
Date >= ymd("2020-03-07"), Date <= ymd(Sys.Date())) %>% pull(cumulative_incident_cases)
# Create an incrementing Day vector the same length as our cases vector
Day <- 1:(length(Infected))
# now specify initial values for S, I and R
init <- c(S = N - Infected[1], I = Infected[1], R = 0)

Then we need to define a function to calculate the \(RSS\), given a set of values for \(\beta\) and \(\gamma\).
# define a function to calculate the residual sum of squares (RSS), passing
# in parameters beta and gamma that are to be optimised for the best fit to
# the incidence data
RSS <- function(parameters) {
names(parameters) <- c("beta", "gamma")
out <- ode(y = init, times = Day, func = SIR, parms = parameters)
fit <- out[, 3]
sum((Infected - fit)^2)
}

Finally, we can fit the SIR model to our data by finding the values for \(\beta\) and \(\gamma\) that minimise the residual sum of squares between the observed cumulative incidence and the predicted cumulative incidence. We also need to check whether our model has converged, as indicated by the message shown below:
# now find the values of beta and gamma that give the smallest RSS, which
# represents the best fit to the data. Start with values of 0.5 for each,
# and constrain them to the interval 0 to 1.0
Opt <- optim(c(0.5, 0.5), RSS, method = "L-BFGS-B", lower = c(0, 0), upper = c(1,
1))
# check for convergence
Opt$message

## [1] "ERROR: ABNORMAL_TERMINATION_IN_LNSRCH"

The optimiser message indicates abnormal termination of the line search rather than clean convergence, so the fitted values should be treated with some caution. Let’s nonetheless examine the fitted values for \(\beta\) and \(\gamma\).
## beta gamma
## 0.5965422 0.4034585
Those values don’t mean a lot, per se, but let’s use them to get the fitted numbers of people in each compartment of our SIR model for the dates up to 17Apr20 that were used to fit the model, and compare those fitted values with the observed data.
That looks like a reasonably good fit to the observed cumulative incidence data, so we can now use our fitted model to calculate the basic reproduction number \(R_{0}\) which gives the average number of susceptible people who are infected by each infectious person:
\[R_{0} = \frac{\beta}{\gamma}\]
That’s very easy to calculate, and we get:
## R0
## 1.478571
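As a minimal sketch, using the fitted parameter values reported above, the ratio works out as:

```r
# R0 is the ratio of the fitted transmission rate beta to the
# fitted recovery rate gamma (values taken from the fit above)
Opt_par <- c(beta = 0.5965422, gamma = 0.4034585)
R0 <- as.numeric(Opt_par["beta"] / Opt_par["gamma"])
round(R0, 6)  # approximately 1.478571
```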
An \(R_{0}\) of about 1.5 is consistent with the values calculated by others for COVID-19, and is also consistent with the \(R_{0}\) for SARS and MERS, which are similar diseases also caused by coronaviruses.
Using the SIR model for Peru to make predictions
An obvious next step is to use our fitted SIR model to make predictions about the future course of the outbreak. However, caution is required, because the SIR model assumes a fixed reproduction number, but if public health interventions have been implemented, such as quarantining of cases, contact tracing and isolation of those contacts, and general restrictions on social mixing, such as closing the country, then the effective reproduction number \(R_{e}\) will be dynamic and should fall as those interventions are progressively implemented, to values considerably less than the basic reproduction number \(R_{0}\), which reflects the behaviour of the virus at the beginning of an epidemic before any response has been implemented.
So let’s use our SIR model, fitted to the first 43 days of data, to extrapolate out to the current date, and compare that against the observed values:
We can see that the actual incidence is similar to that predicted by our model. The reason is that, even with the public health interventions implemented by the Peruvian authorities, the \(R_{e}\) of COVID-19 in Peru hasn’t been reduced by much. This is something that will need to be addressed ASAP.
When the \(R_{e}\) falls below 1.0, the peak of the epidemic will have been reached and the outbreak will eventually die out.
Using our model to let the outbreak “run its course” without intervention
It is instructive to use our model, fitted to the first 43 days of available data on lab-confirmed cases in Peru, to see what would happen if the outbreak were left to run its course, without public health interventions.
It is easier to see what is going on if we use a log scale:
Clearly that prediction, should it come to pass, would be an unmitigated disaster. At this point, it is worth remarking on the importance of decisive public health intervention to limit the spread of such epidemics. Without such interventions, tens of millions of people could be infected, as our model predicts, and even with only a one or two per cent mortality rate, hundreds of thousands of people would die. The economic and human cost of the outbreak control steps taken within Peru are large, but the alternatives are a lot worse!
Ascertainment rates
So far, we have assumed that the counts of lab-confirmed cases represent all the cases that are infectious. This is unlikely to be true – typically only a proportion of actual cases are detected or found or sent for testing. This proportion is known as the ascertainment rate. The ascertainment rate is likely to change during the course of an outbreak, particularly if surveillance and screening efforts are increased, or if case definitions are changed. Such changing ascertainment rates can be easily incorporated into our model by using a set of weights, or a weighting function, for our incidence data, but for the sake of simplicity, let’s see what happens if we assume a fixed ascertainment rate of 20%2. If we apply that, thus inflating the number of incident cases by a factor of 5, and refit our model, we get the following results.
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
## beta gamma
## 0.5967629 0.4032400
## R0
## 1.47992
Note that these fitted parameters are the same as the ones we got above, without an ascertainment rate adjustment. Let’s look at the fitted values.
Perhaps counter-intuitively, incorporation of a fixed 20% ascertainment rate adjustment into our model makes little difference to the modelled outbreak if it is let run its course, except that it all happens a bit more quickly. But the number of infections remains the same, and the basic reproduction number is unchanged. Note that that is for a fixed ascertainment rate. If the ascertainment rate varies significantly over time, then the parameter estimates will necessarily be biased – but in the early days of an outbreak, it may be reasonable to assume that ascertainment rates don’t change too much.
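The inflation step described above is simple; here is a base-R sketch with toy counts (not the Peru data):

```r
# With a fixed 20% ascertainment rate, observed counts represent one
# fifth of true infections, so divide by the rate (i.e. multiply by 5)
ascertainment_rate <- 0.20
observed <- c(10, 25, 60)   # toy observed daily case counts
estimated_true <- observed / ascertainment_rate
estimated_true              # 50 125 300
```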
Epidemic trajectory model using log-linear models
As noted above, the initial exponential phase of an outbreak, when shown in a semi-log plot (the y-axis with a logarithmic transform), looks approximately linear. This suggests that we may be able to model epidemic growth and decay using a simple log-linear model:
\[log(y) = rt + b\]
where \(y\) is the incidence, \(r\) is the growth rate, \(t\) is the number of days since a specific point in time (typically the start of the outbreak), and \(b\) is the intercept. Separate models are fitted to the growth and the decay parts of the epidemic (incidence data) curve.
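A sketch of this fitting using the RECON incidence package; the vector of per-case dates is hypothetical here, standing in for the Peru line-listing:

```r
library(incidence)

# peru_onset_dates: hypothetical vector with one (reporting) date per case
peru_incid <- incidence(peru_onset_dates)

# fit() estimates the log-linear growth rate r and the doubling time;
# applied before the peak this captures the growth phase
peru_incid_fit <- fit(peru_incid)
peru_incid_fit$info$r         # daily growth rate
peru_incid_fit$info$doubling  # doubling time in days
plot(peru_incid, fit = peru_incid_fit)
```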
I proceed to find the peak of confirmed positive cases based on the currently available data.
Now I proceed to fit a log-linear model for the growth phase before the peak. I can plot the fitted values from our model (with confidence limits) on top of the actual observed positive incidence data for Peru.
And here I will evaluate the force-of-infection λ plot.
Model Fitted Statistics (last update: 2020-04-17)

| Dates | R2 | Adj. R2 | Deviance |
|---|---|---|---|
| all | 0.91 | 0.90 | 15.84 |
From the model, we can extract various parameters of interest: the growth rate prior to the peak was 0.16 (95% CI 0.15 - 0.18).
This growth rate is equivalent to a doubling time of 4.2 days (95% CI 3.8 - 4.7 days).
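The doubling time follows directly from the growth rate as \(T_2 = \ln(2)/r\); a quick base-R check using the rounded point estimate and CI bounds quoted above (small differences from the quoted doubling times are due to rounding of \(r\)):

```r
# Doubling time T2 = log(2) / r for the estimate and CI bounds
r <- c(estimate = 0.16, lower = 0.15, upper = 0.18)
T2 <- log(2) / r
round(T2, 1)  # roughly 4.3 (estimate), 4.6 (from lower r), 3.9 (from upper r)
```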
The doubling time estimate is quite informative for current public health intervention policy in Peru.
Assessment: The outbreak in Peru is not under control yet and may still grow if public health policies are not followed.
Estimating the reproduction number from log-linear models
Based on the log-linear model of the epidemic trajectory, I’m able to estimate the reproduction number in the growth phase of the epidemic. I need to provide a distribution for the serial interval time, which is the time between the onset of a primary case and the time of onset in its secondary cases. I’ll use a mean of 7.5 days and a standard deviation of 3.4 days to parameterise a discrete gamma distribution for the serial interval, based on analysis of just 5 primary cases amongst the first 450 cases in Wuhan, published by Li et al. Here is a histogram of our calculated distribution of possible values for \(R_{0}\) for the growth phase, based on the log-linear model we fitted to the Peru incidence data:
mu <- 7.5 # days
sigma <- 3.4 # days
param <- gamma_mucv2shapescale(mu, sigma/mu)
w <- distcrete("gamma", interval = 1, shape = param$shape, scale = param$scale,
w = 0)
growth_R0 <- lm2R0_sample(Peru_incidence_fit$model, w)
hist(growth_R0, col = "darkblue", border = "white", main = "R0 distribution in Peru")

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.439 2.660 2.715 2.722 2.806 3.013
Note that the central estimates for \(R_{0}\) are considerably higher than those we calculated with a SIR model fitted to the same Peru data, and are also higher than \(R_{0}\) estimates published elsewhere, but they are consistent with estimates of the instantaneous effective reproduction number \(R_{e}\) which are calculated in the next section3.
Estimating changes in the effective reproduction number
It would be useful to estimate the current effective reproduction number \(R_{e}\) on a day-by-day basis so as to track the effectiveness of public health interventions in Peru, and possibly predict at the earliest opportunity when an outbreak will turn the corner.
There are several available methods for estimating \(R_{e}\). Here I will focus on one method, developed in 2013 by Anne Cori and colleagues at Imperial College London, which permits estimation of the instantaneous effective reproduction number, which is exactly what we want in order to track the effectiveness of containment efforts. Full details are available in the original paper on the method, with extensions described in a later paper by Thompson et al.
Once again, I’ll start with the counts of lab-confirmed positive cases in Peru from 6th March 2020 onwards.
A critical parameter for the Cori model is the serial interval (SI). The SI is the time between onset of symptoms of each case of the disease in question, and the onset of symptoms in any secondary cases that result from transmission from the primary cases. In other words, it is the time between cases in the (branching) chain of transmission of the disease. A moment’s reflection reveals that the SI is, in fact, a statistical distribution of serial interval times, rather than a fixed value. That distribution can be simulated, typically using a discrete gamma distribution with a given mean and standard deviation.
There are different models that allow uncertainty to be incorporated into the parameterisation of this distribution, and the SI distribution can even be estimated empirically, using Bayesian methods, from individual line-listings of cases. We’ll examine all of those capabilities.
I’m currently using one published estimate of the serial interval distribution, derived from analysis of just 5 primary cases amongst the first 450 cases in Wuhan, published by Li et al. They estimate the serial interval distribution to have a mean of 7.5 days with a standard deviation of 3.4 days. This is almost identical to the serial interval parameters for the MERS virus, which were a mean of 7.6 and an SD of 3.4. This similarity is not surprising given that both pathogens are coronaviruses. Let’s use that to estimate the instantaneous \(R_{e}\) for Peru.
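A sketch of the EpiEstim call for this estimation; the incidence data frame name is hypothetical (EpiEstim expects columns named dates and I):

```r
library(EpiEstim)

# peru_incid_df: hypothetical data frame with columns `dates` and `I`
# (daily incident case counts)
res_parametric <- estimate_R(peru_incid_df,
                             method = "parametric_si",
                             config = make_config(list(mean_si = 7.5,
                                                       std_si  = 3.4)))
plot(res_parametric)  # 7-day sliding-window estimates of instantaneous Re
```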
The first thing to note is that the slope of the effective reproduction number curve has yet to show a prominent downward shift, which strongly suggests that containment efforts have yet to succeed in reducing transmission of the disease in Peru.
The second thing to note is that the 7-day sliding window estimates of instantaneous \(R_{e}\) are very high, approaching 15 at the peak. This seems unlikely. The Cori model could be flawed, but it has been shown to accurately estimate \(R_{e}\) using a wide range of historical outbreak data. There are, however, a number of possible alternative explanations.
One possible explanation is that COVID-19 is transmissible before the onset of symptoms, resulting in much shorter serial intervals than expected, possibly shorter than the incubation period. Alternatively, and very likely, there may be non-symptomatic, sub-clinical spreaders of the disease, who are undetected. Again, the effect is as if the serial interval is very short; it would be desirable to explicitly model that scenario, but current methods don’t permit it.
To cover the possibility that some cases transmit the disease very soon after infection, possibly before the onset of symptoms (so-called super-spreaders), and that some cases are sub-clinical and thus undetected while still spreading the disease, whereas other cases have a serial interval more consistent with that of MERS or SARS, with a mean of around 8 days, I incorporate this uncertainty around the serial interval distribution by specifying a distribution of distributions of serial intervals. So let’s retain the mean SI estimated by Li et al. of 7.5 days, with an SD of 3.4, but let’s also allow that mean SI to vary between 2.3 and 8.4 using a truncated normal distribution with an SD of 2.0. We’ll also allow the SD of the SI to vary between 0.5 and 4.0.
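A sketch of how this uncertainty can be specified with EpiEstim’s "uncertain_si" method; the data frame name is hypothetical, and std_std_si is an assumed value, as the text does not state it:

```r
library(EpiEstim)

# peru_incid_df: hypothetical data frame with columns `dates` and `I`
si_config <- make_config(list(
  mean_si     = 7.5,  # Li et al. point estimate for the mean SI
  std_mean_si = 2.0,  # SD of the truncated normal for the mean SI
  min_mean_si = 2.3,
  max_mean_si = 8.4,
  std_si      = 3.4,  # point estimate for the SD of the SI
  std_std_si  = 1.0,  # assumed: SD of the distribution of the SI's SD
  min_std_si  = 0.5,
  max_std_si  = 4.0))

res_uncertain <- estimate_R(peru_incid_df,
                            method = "uncertain_si",
                            config = si_config)
```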
Recalculating with those SI distribution parameters and meta-parameters, we get these results:
The analysis above looks more reasonable, and while \(R_{e}\) is clearly falling in Peru, it is not falling very fast.
Another way to look at this is to use empirical estimation of the serial interval based on pair data – that is, unit records of pairs of primary and secondary cases. Li et al. provide such data for just 5 primary cases from the early stages of the outbreak in Wuhan:
Figure 3 from Li et al.
Notice in particular that the secondary cases in cluster 4 have very short serial intervals, which supports the meta-distribution we explored above (albeit that set of short intervals is for only one primary case, but that is the only serial interval data currently available, it seems!).
Let’s use those serial interval data to re-estimate \(R_{e}\). Bayesian methods are used, and the trace output below is from the MCMC (Markov-chain Monte Carlo) resampling methods used.
## Running 1500 MCMC iterations
## MCMCmetrop1R iteration 1 of 1500
## function value = -30.27974
## theta =
## 1.40008
## 0.46471
## Metropolis acceptance rate = 0.00000
##
## MCMCmetrop1R iteration 1001 of 1500
## function value = -31.54334
## theta =
## 0.52339
## 1.20517
## Metropolis acceptance rate = 0.55844
##
##
##
## @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
## The Metropolis acceptance rate was 0.56400
## @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
##
##
## @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
## Estimating the reproduction number for these serial interval estimates...
## @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
That is remarkably similar to our estimates based on an “uncertain” meta-distribution for the serial interval, above, which is encouraging.
Nonetheless some of the \(R_{e}\) values are still rather high. Remember, these are instantaneous reproductive numbers, averaged over a sliding 7-day window. Let’s change that to a shorter window to see what the day-to-day changes in \(R_{e}\) are, rather than based on an aggregation of the preceding 7-days. I’ll plot the results using a logarithmic scale with a red reference line at 1.0, representing the reproduction number below which the outbreak will start to die out.
## Running 1500 MCMC iterations
## MCMCmetrop1R iteration 1 of 1500
## function value = -30.27974
## theta =
## 1.40008
## 0.46471
## Metropolis acceptance rate = 0.00000
##
## MCMCmetrop1R iteration 1001 of 1500
## function value = -31.54334
## theta =
## 0.52339
## 1.20517
## Metropolis acceptance rate = 0.55844
##
##
##
## @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
## The Metropolis acceptance rate was 0.56400
## @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
##
##
## @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
## Estimating the reproduction number for these serial interval estimates...
## @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
So, it looks like the outbreak has yet to be brought under control in Peru, at least based on the published lab-confirmed positive case numbers and the available serial interval data or estimates of its distribution.
That control measures have yet to work is also clear from the daily incremental incidence numbers for Peru:
Projections
After creating projections based on a simple SIR model, it is possible to use a more complex model to generate a sophisticated projection as well.
Below I created a projection based on data up to 10 April 2020. The projection extends up to 14 days from the last observed positive case.
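A sketch of such a projection with the RECON projections package; the object names are hypothetical, and `si_distr` stands for a distcrete serial-interval distribution like the one built earlier with distcrete():

```r
library(projections)

# peru_incid: hypothetical incidence object for the Peru case counts
proj <- project(peru_incid,
                R = c(1.2, 1.5),  # assumed plausible Re values
                si = si_distr,    # distcrete serial-interval distribution
                n_days = 14,      # project 14 days beyond the last case
                n_sim = 1000)
plot(proj)
```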
Based on that projection, it will take until May for the outbreak in Peru to be extinguished.
It is also possible to fit SIR and related models by MLE.↩︎
In a very interesting paper, Nishiura et al. state: “There were two interesting features of the evacuation process conducted by Japanese authorities. First, all 565 passengers [repatriated to Japan from Wuhan] were screened for symptoms upon arrival. Prior to disembarkation, the quarantine officers used portable thermoscanners to screen the temperature. In the following, all passengers were interviewed regarding whether they have any symptoms suggestive of upper respiratory tract infection, including fever and cough. Of the 565 passengers, 63 were symptomatic. Second, the passengers were tested for presence of 2019‐nCoV using reverse transcription polymerase chain reaction (RT‐PCR), and eight passengers (1.4%) were determined to have the virus. Importantly, most of the individuals positive for the virus were asymptomatic (five passengers), while only the other three had symptoms consistent with 2019‐nCoV infection, indicating that the dataset generated from this process can help investigate the full spectrum of infection including mild and asymptomatic infections.”. Thus, they found about 60% of those infected with the COVID-19 virus were essentially asymptomatic, and thus likely to be undetected and unascertained.↩︎
They are also consistent with unpublished estimates of \(R_{0}\) for the COVID-19 virus being discussed by various experts at WHO.↩︎